Identifying Vendors via CVE Description Feeds

Fine-tuning GPT-2 for Classification

Common Vulnerabilities and Exposures (CVE) is a list of entries for publicly known cybersecurity vulnerabilities, each containing an identification number, a description, and at least one public reference. The list is maintained by MITRE, and its entries are published and enriched in the National Vulnerability Database (NVD), which is maintained by NIST.
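
To make the structure of an entry concrete, here is a minimal sketch of a CVE record as a Python dataclass. The class name `CveEntry` and its field names are hypothetical (this is not an NVD or MITRE schema). The ID check encodes the standard `CVE-YYYY-NNNN` format, in which the sequence number has four or more digits.

```python
import re
from dataclasses import dataclass, field

# CVE IDs look like CVE-YYYY-NNNN; the sequence part has four or more digits.
CVE_ID_PATTERN = re.compile(r"^CVE-\d{4}-\d{4,}$")


@dataclass
class CveEntry:
    """Hypothetical container for one CVE entry: ID, description, references."""

    cve_id: str
    description: str
    references: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject IDs that do not follow the CVE-YYYY-NNNN(+) syntax.
        if not CVE_ID_PATTERN.match(self.cve_id):
            raise ValueError(f"malformed CVE ID: {self.cve_id!r}")
        # Every CVE entry carries at least one public reference.
        if not self.references:
            raise ValueError("a CVE entry needs at least one public reference")
```

Only the ID and description matter for the classifier; the references are kept here just to mirror the definition above.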

When new CVEs are published on the NVD, they typically contain a paragraph of text (the 'description') that describes the vulnerability. For example, the description for CVE-2018-17189 reads:

In Apache HTTP server versions 2.4.37 and prior, by sending request bodies in a slow loris way to plain resources, the h2 stream for that request unnecessarily occupied a server thread cleaning up that incoming data. This affects only HTTP/2 (mod_http2) connections.

NVD can take 3 to 5 business days to fill in the 'vendor' column; in this case, the vendor is apache.
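
Framed as supervised learning, each CVE contributes one (description, vendor) pair, and the vendor names become integer class labels. The sketch below assumes the descriptions and NVD vendor names have already been collected into a list of pairs; the helper names `build_label_maps` and `to_examples` are hypothetical and not taken from the notebook.

```python
def build_label_maps(vendors):
    """Assign a stable integer ID to each distinct vendor (sorted for reproducibility)."""
    names = sorted(set(vendors))
    vendor_to_id = {name: i for i, name in enumerate(names)}
    id_to_vendor = {i: name for name, i in vendor_to_id.items()}
    return vendor_to_id, id_to_vendor


def to_examples(records, vendor_to_id):
    """Turn (description, vendor) pairs into (text, label) pairs for a classifier."""
    return [(description, vendor_to_id[vendor]) for description, vendor in records]


# Illustrative records: the first is CVE-2018-17189, the others are placeholders.
records = [
    ("In Apache HTTP server versions 2.4.37 and prior, ...", "apache"),
    ("Placeholder description of a Microsoft product flaw.", "microsoft"),
    ("Another placeholder description.", "apache"),
]
vendor_to_id, id_to_vendor = build_label_maps(vendor for _, vendor in records)
examples = to_examples(records, vendor_to_id)
```

Sorting the vendor names keeps the label IDs identical across runs, so a saved model's output indices can be mapped back to vendor names with `id_to_vendor`.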

This exercise explores whether the vendor can be derived by fine-tuning the GPT-2 model to read the description text. If so, new CVEs could be classified automatically without waiting for NVD to supplement the details.

Main idea: GPT-2 is a decoder-only transformer that uses causal (masked) self-attention, so each token can attend only to itself and the tokens before it. The hidden state at the last token of the input is therefore the only one conditioned on the entire sequence, and it is the state the model normally uses to predict the next token. Instead of passing that state to the language-modeling head to generate text, we can pass it to a classification head that predicts the vendor.
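
The reasoning above can be sketched in plain Python without any model weights. The sketch assumes right-padded batches (attention mask of 1 for real tokens, 0 for padding) and uses toy hidden states; `causal_visibility`, `pool_last_token`, and `linear_head` are hypothetical helpers, and the linear head stands in for whatever classification layer is actually trained.

```python
def causal_visibility(seq_len):
    """For each position i, list the positions it may attend to under a causal mask (j <= i)."""
    return [list(range(i + 1)) for i in range(seq_len)]


def last_token_index(attention_mask):
    """Index of the last real token, assuming right padding (1 = token, 0 = pad)."""
    n_real = sum(attention_mask)
    if n_real == 0:
        raise ValueError("sequence contains no real tokens")
    return n_real - 1


def pool_last_token(hidden_states, attention_mask):
    """Select the hidden-state vector at the last real token of one sequence."""
    return hidden_states[last_token_index(attention_mask)]


def linear_head(vector, weights, bias):
    """Classification head: one logit per vendor class (weights is classes x hidden)."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


# Only the last position of a 3-token sequence can see all three tokens.
visibility = causal_visibility(3)

# Toy hidden states for 3 real tokens followed by 2 padding positions.
mask = [1, 1, 1, 0, 0]
hidden = [[1, 2], [3, 4], [5, 6], [0, 0], [0, 0]]
pooled = pool_last_token(hidden, mask)
logits = linear_head(pooled, weights=[[1, 0], [0, 1], [1, 1]], bias=[0, 0, 0])
predicted_class = max(range(len(logits)), key=logits.__getitem__)
```

With left padding the last position is always a real token, so the index lookup is unnecessary. In the Hugging Face transformers library, `GPT2ForSequenceClassification` applies the same idea and locates the last non-padding token via `config.pad_token_id`; because the GPT-2 tokenizer defines no padding token by default, a common workaround is to reuse the end-of-sequence token for padding.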

Previously, an LSTM model was used for this classification task and reached a validation accuracy of 93% on 20 vendors. With GPT-2, the goal is to increase the number of vendors (classes) while maintaining high accuracy.
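
Increasing the number of classes means choosing which vendors to keep and then comparing validation accuracy across settings. The README does not say how the 20 vendors were chosen, so keeping the N most frequent vendors is an assumption in this sketch; `top_n_vendors`, `filter_to_classes`, and `accuracy` are hypothetical helpers.

```python
from collections import Counter


def top_n_vendors(vendors, n):
    """Return the n most frequent vendor names (assumed class-selection rule)."""
    return [name for name, _ in Counter(vendors).most_common(n)]


def filter_to_classes(records, classes):
    """Keep only (description, vendor) pairs whose vendor is one of the chosen classes."""
    keep = set(classes)
    return [(description, vendor) for description, vendor in records if vendor in keep]


def accuracy(predictions, labels):
    """Fraction of predicted labels that match the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Raising `n` from 20 upward and recomputing `accuracy` on the validation split gives a direct way to see how much accuracy is traded for broader vendor coverage.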

SzeKiatTan/nlp-cve-vendor-classification-gpt2

Folders and files

Latest commit

History

ANLP_Project_GPT2_Finetune_Classification.ipynb

ANLP_Project_GPT2_Finetune_Classification.ipynb

README.md

README.md

Repository files navigation

Identifying Vendors via CVE Description Feeds

Finetuning GPT2 for Classification

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages